Here we get a database called HappyDB that “a corpus of 100,000 crowd-sourced happy moments”. We know many things can make people smile. In this project we merged HappyDB and demographic.csv that contains demographic information of the workers who contributed to the happy moment collection to analyze the differences of happy moments by gender,country and age.I used word cloud, sentiment analysis and topic model to analysis the database and visualize the data.
Step 0: Check and install needed packages. Load the libraries and functions
library(tm)
## Warning: package 'tm' was built under R version 3.4.4
## Loading required package: NLP
library(tidytext)
## Warning: package 'tidytext' was built under R version 3.4.4
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 3.4.4
## -- Attaching packages ---------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.0.0 v purrr 0.2.4
## v tibble 1.4.2 v dplyr 0.7.4
## v tidyr 0.8.0 v stringr 1.2.0
## v readr 1.1.1 v forcats 0.3.0
## Warning: package 'ggplot2' was built under R version 3.4.4
## Warning: package 'tibble' was built under R version 3.4.4
## Warning: package 'tidyr' was built under R version 3.4.4
## Warning: package 'purrr' was built under R version 3.4.4
## Warning: package 'dplyr' was built under R version 3.4.3
## Warning: package 'forcats' was built under R version 3.4.4
## -- Conflicts ------------------------------------- tidyverse_conflicts() --
## x ggplot2::annotate() masks NLP::annotate()
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(DT)
## Warning: package 'DT' was built under R version 3.4.4
library(dplyr)
library(stringr)
library(tidyr)
library(janeaustenr)
## Warning: package 'janeaustenr' was built under R version 3.4.4
library(ggplot2)
library(wordcloud)
## Warning: package 'wordcloud' was built under R version 3.4.4
## Loading required package: RColorBrewer
library(stm)
## Warning: package 'stm' was built under R version 3.4.4
## stm v1.3.3 (2018-1-26) successfully loaded. See ?stm for help.
## Papers, resources, and other materials at structuraltopicmodel.com
library(quanteda)
## Warning: package 'quanteda' was built under R version 3.4.4
## Package version: 1.3.4
## Parallel computing: 2 of 4 threads used.
## See https://quanteda.io for tutorials and examples.
##
## Attaching package: 'quanteda'
## The following objects are masked from 'package:tm':
##
## as.DocumentTermMatrix, stopwords
## The following object is masked from 'package:utils':
##
## View
library(reshape2)
##
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
##
## smiths
library(plotly)
## Warning: package 'plotly' was built under R version 3.4.4
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
Step 1: Data harvest
import the HappyDB from the website and then clean the text
urlfile<-'https://raw.githubusercontent.com/rit-public/HappyDB/master/happydb/data/cleaned_hm.csv'
hm_data <- read_csv(urlfile)
## Parsed with column specification:
## cols(
## hmid = col_integer(),
## wid = col_integer(),
## reflection_period = col_character(),
## original_hm = col_character(),
## cleaned_hm = col_character(),
## modified = col_character(),
## num_sentence = col_integer(),
## ground_truth_category = col_character(),
## predicted_category = col_character()
## )
corpus <- VCorpus(VectorSource(hm_data$cleaned_hm))%>%
tm_map(content_transformer(tolower))%>%
tm_map(removePunctuation)%>%
tm_map(removeNumbers)%>%
tm_map(removeWords, character(0))%>%
tm_map(stripWhitespace)
Next we stem the words and convert them to a “tidy” object. We build a new dataframe called new_df that only contains the hmid,wid and text sentence and then count the number of comments made by each worker(the reason why we do this is we want to count the ratio of each word later).Then add column ratio to dataframe new_df
stemmed <- tm_map(corpus, stemDocument) %>%
tidy() %>%
select(text)
new_df <- data_frame(hm_data$hmid, hm_data$wid, stemmed$text)
colnames(new_df) <- c('hmid', 'wid', 'text')
comment_count <- new_df %>% group_by(wid) %>% summarise(comment_num = n())
new_df <- merge(new_df, comment_count, by = 'wid')
Then splite each sentence into word and We remove stopwords provided by the “tidytext” package and add some custom stopword
dict <- tidy(corpus) %>%
select(text) %>%
unnest_tokens(dictionary, text)
data("stop_words")
word <- c("happy","ago","yesterday","lot","today","months","month",
"happier","happiest","last","week","past")
stop_words <- stop_words %>%
bind_rows(mutate(tibble(word), lexicon = "updated"))
Lastly, we complete the stems by picking the corresponding word with the highest frequency.
completed <- new_df %>%
unnest_tokens(stems, text) %>%
bind_cols(dict) %>%
anti_join(stop_words, by = c("dictionary" = "word"))
completed <- completed %>%
group_by(stems) %>%
count(dictionary) %>%
mutate(word = dictionary[which.max(n)]) %>%
ungroup() %>%
select(stems, word) %>%
distinct() %>%
right_join(completed) %>%
select(-stems)
## Warning: package 'bindrcpp' was built under R version 3.4.3
## Joining, by = "stems"
Because we want to do furture investigation then we import demographic.csv,including country,gender,age from the same database. For our further investigation, we combine file completed and demo.list together into one single file.
demo <- read_csv('https://raw.githubusercontent.com/rit-public/HappyDB/master/happydb/data/demographic.csv')
## Parsed with column specification:
## cols(
## wid = col_integer(),
## age = col_character(),
## country = col_character(),
## gender = col_character(),
## marital = col_character(),
## parenthood = col_character()
## )
merge_df <- merge(completed, demo, by = 'wid')
step 2: sentiments analysis and data visualization
step2.1 select words using sentiment lexicons “bing”
There are many sentiment lexicons available on the internet. So first Let’s look at the words with a joy score from the bing lexicon. What are the most common positive and negative words inner join with our database? We can use wordcloud to plot it. From the result we can see “favorite”,“nice”,“won”,“beautiful”,“amazing”,“enjoyed”,“excited”,“awesome”,“bonus”,“free” are the top 10 positive words that mention mostly by the people. Combined with my personal experience I will also feel happily when I get my “favorite” gift, when I “enjoyed” a “exciting” films, when I get a “free meal”" and so on.
bing <- get_sentiments("bing")
j_word <- merge_df %>%
inner_join(bing) %>%
count(word, sentiment, sort = TRUE)
## Joining, by = "word"
j_wid_word <- merge_df %>%
inner_join(bing) %>%
group_by(wid, word, sentiment) %>%
summarise(count = n())
## Joining, by = "word"
j_wid_word_final <- merge(j_wid_word, comment_count, by = 'wid')
j_wid_word_final$ratio <- j_wid_word_final$count / j_wid_word_final$comment_num
library(wordcloud)
library(reshape2)
j_word %>%
acast(word ~ sentiment, value.var = "n", fill = 0) %>%
comparison.cloud(colors = c("#458B74", "#FF7F50"),
title.size=1.5, max.words = 150)
I also plot top10 positive words and top10 negative words. These histograms are more straghtforward to see.
j_word %>%
group_by(sentiment) %>%
top_n(10) %>%
ungroup() %>%
mutate(word = reorder(word, n)) %>%
ggplot(aes(word, n, fill = sentiment)) +
geom_col(show.legend = FALSE) +
facet_wrap(~sentiment, scales = "free_y") +
labs(y = "Contribution to sentiment",
x = NULL) +
coord_flip() + theme_minimal(base_size = 7)
## Selecting by n
step2.2 analyze the frequency of the topten words mention by different age level
further more I am curious whether I could analyze the frequency of the topten words mention by different age group. Maybe the results will reflect the living habits of differnet age group.So I draw two graphs to show the results.The first one is the age for every year and the second one I classify the age [17,90] into 8 levels. From the first graph we can see straighforward results in every age. The second one is very easy to compare young,middle and old age.
merge_df$age <- as.numeric(merge_df$age)
## Warning: NAs introduced by coercion
age_count <- merge_df %>% group_by(age) %>%
summarise(count = n()) %>% filter(age >= 5) %>% filter(age <= 200)
j_age_word <- merge_df %>%
inner_join(bing) %>%
group_by(age, word, sentiment) %>%
summarise(count = n())
## Joining, by = "word"
j_age_word_final <- merge(j_age_word, age_count, by = 'age')
j_age_word_final$ratio <- j_age_word_final$count.x / j_age_word_final$count.y
pos = c("favorite","nice","won","beautiful","amazing","enjoyed","excited","awesome","bonus","free")
neg = c("bad","break","anxiety","difficult","lost","aggression","cold","hard","afraid","broke")
j_age_word_final_topten_pos <- j_age_word_final %>% filter(word %in% pos)
j_age_word_final_topten_neg <- j_age_word_final %>% filter(word %in% neg)
g2 <- ggplot(j_age_word_final_topten_pos, aes(age, ratio,fill = word)) +
geom_bar(stat = "identity", show.legend = FALSE)
ggplotly(g2)
g3 <- ggplot(j_age_word_final_topten_neg, aes(age, ratio,fill = word)) +
geom_bar(stat = "identity", show.legend = FALSE)
ggplotly(g3)
library(Hmisc)
## Warning: package 'Hmisc' was built under R version 3.4.4
## Loading required package: lattice
##
## Attaching package: 'lattice'
## The following object is masked from 'package:stm':
##
## cloud
## Loading required package: survival
## Loading required package: Formula
##
## Attaching package: 'Hmisc'
## The following object is masked from 'package:plotly':
##
## subplot
## The following objects are masked from 'package:dplyr':
##
## src, summarize
## The following objects are masked from 'package:base':
##
## format.pval, units
agebin <- cut2(age_count$age, cuts=c(30,40,50,60,70,80,90))
age_count$agebin <- agebin
agebin_count <- aggregate(age_count$count, by=list(Category=age_count$agebin), FUN=sum)
merge_df <- merge_df %>% filter(age >= 5 & age <= 200)
agebin2 <- cut2(merge_df$age, cuts=c(30,40,50,60,70,80,90))
merge_df$agebin <- agebin2
j_agebin_word <- merge_df %>%
inner_join(bing) %>%
group_by(agebin, word, sentiment) %>%
summarise(count = n())
## Joining, by = "word"
colnames(agebin_count) <- c("agebin","count")
j_agebin_word_final <- merge(j_agebin_word, agebin_count, by = 'agebin')
try <- data_frame(j_agebin_word_final$count.x)
try2 <- data_frame(j_agebin_word_final$count.y)
tryall <- cbind(try, try2)
colnames(tryall) <- c("try","try2")
tryall$ratio <- tryall$try /tryall$try2
j_agebin_word_final$ratio <- tryall$ratio
pos = c("favorite","nice","won","beautiful","amazing","enjoyed","excited","awesome","bonus","free")
neg = c("bad","break","anxiety","difficult","lost","aggression","cold","hard","afraid","broke")
j_agebin_word_final_topten_pos <- j_agebin_word_final %>% filter(word %in% pos)
j_agebin_word_final_topten_neg <- j_agebin_word_final %>% filter(word %in% neg)
g4 <- ggplot(j_agebin_word_final_topten_pos, aes(agebin, ratio,fill = word)) +
geom_bar(stat = "identity", show.legend = FALSE)
ggplotly(g4)
g5 <- ggplot(j_agebin_word_final_topten_neg, aes(agebin, ratio,fill = word)) +
geom_bar(stat = "identity", show.legend = FALSE)
ggplotly(g5)
From the result we see that “bonus” and “favorite” are more mentioned by people over 50 years old. And “won” are more mentioned by young people(except level [80,90]).It is make sense because young people like competitons and competitons actually exist in their life such as work and study. So when they won something they will feel so happy. For old people they usually buy clothing and others just follow their heart(buy their favority thing).They won’t concern more about whether I can wear this to school or whether it is too expensive and so on.But old people consider more about “bonus”. They work in a company for a long time, so they will put more attention on their bonus especially for people who will retire.
step2.3 analyze the frequency of the topten words mention by gender
Further more I am wondering whether I could analyze the frequency of the topten words mention by female and male. The results will reflect the living habits by gender. So I plot two graphs, the first one is positive words and second one is negative words.
merge_df_gender <- merge(hm_data, demo, by = 'wid')
gender_count <- merge_df_gender %>% group_by(gender) %>% summarise(count = n())
g = c("f","m")
gender_count <- gender_count %>% filter(gender %in% g)
merge_df <- merge_df %>% filter(gender %in% g)
j_gender_word <- merge_df %>%
inner_join(bing) %>%
group_by(gender, word, sentiment) %>%
summarise(count = n())
## Joining, by = "word"
j_gender_word_final <- merge(j_gender_word, gender_count, by = 'gender')
j_gender_word_final$ratio <- j_gender_word_final$count.x / j_gender_word_final$count.y
j_gender_word_final_topten_pos <- j_gender_word_final %>% filter(word %in% pos)
j_gender_word_final_topten_neg <- j_gender_word_final %>% filter(word %in% neg)
library(plotly)
g6 <- ggplot(j_gender_word_final_topten_pos, aes(gender, ratio,fill = word)) +
geom_bar(stat = "identity", show.legend = FALSE)
ggplotly(g6)
g7 <- ggplot(j_gender_word_final_topten_neg, aes(gender, ratio,fill = word)) +
geom_bar(stat = "identity", show.legend = FALSE)
ggplotly(g7)
From the graphs we can totally analysis the characters difference between female and male. On the view of male, they mention “amazing” and “aewsome” more frequently. This fits to male’s characters. Male like to explore newness and mystery things, So “amazing”" and “aewsome”" things will make them more happy. But for female, they will buy their favority things instead of newness things. So women always buy the same brand’s clothing and food. Also free things will make women more happily.
step2.4 analyze the frequency of the topten words between parenthood
“most children report painful feelings about their parents’ divorce, and a significant minority of children suffer extended and prolonged symptomatology related to parental divorce that may include both internalizing and externalizing problems” mentioned by Catherine M Lee in the paper “Children’s reactions to parental separation and divorce”. So this causes my curiosity, I want to use data to explore the emotion change. Are people whose parents are divorced more negative than people whose parents relationships are healthy?
merge_df_parenthood <- merge(hm_data, demo, by = 'wid')
parenthood_count <- merge_df_parenthood %>% group_by(parenthood) %>% summarise(count = n()) %>% filter(count != 78)
j_parenthood_word <- merge_df %>%
inner_join(bing) %>%
group_by(parenthood, word, sentiment) %>%
summarise(count = n())
## Joining, by = "word"
j_parenthood_word_final <- merge(j_parenthood_word, parenthood_count, by = 'parenthood')
j_parenthood_word_final$ratio <- j_parenthood_word_final$count.x / j_parenthood_word_final$count.y
j_parenthood_word_final_topten_pos <- j_parenthood_word_final %>% filter(word %in% pos)
j_parenthood_word_final_topten_neg <- j_parenthood_word_final %>% filter(word %in% neg)
g12 <- ggplot(j_parenthood_word_final_topten_pos, aes(parenthood, ratio,fill = word)) +
geom_bar(stat = "identity", show.legend = FALSE)
ggplotly(g12)
People whose parenthood status are healthy mention more positive words than those whose parenthood status are broken. Because the number of people in level “n” and level “y” are different, So we use ratio= # of this words mention by people in this level/# of peopel in this level. This shows children whose parents are divorced have a emotion change.
Summary: We find many characters for differnt group people. Those findings give us many business value. For example, When we send email of our company’s products, we can send more fancy styles to male and send more normal styles to women.We also could give some free gifts to women when they shopping at store.This will make them more happile and maybe will ignite their shopping desire.For older man we can give them more points that can redeem some thing when they shopping at our store. This is similar to “bonus”.
step 3: Topic model In this step, I want to apply some topic models. So first I delete all emotion words from the dataset. I want to put more attention on the events that make people happily.I also put age[17,90] into 8 levels. It is more convenience to compare with each other in this way. I want to explore what events make people happily in each age level.
## Joining, by = "word"
## Beginning Spectral Initialization
## Calculating the gram matrix...
## Finding anchor words...
## ........
## Recovering initialization...
## .............................
## Initialization complete.
## ........
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 1 (approx. per word bound = -7.789)
## ........
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 2 (approx. per word bound = -7.734, relative change = 7.002e-03)
## ........
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 3 (approx. per word bound = -7.653, relative change = 1.047e-02)
## ........
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 4 (approx. per word bound = -7.627, relative change = 3.391e-03)
## ........
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 5 (approx. per word bound = -7.601, relative change = 3.427e-03)
## Topic 1: beat, straight, traveling, bogo, chosen
## Topic 2: apartment, straight, traveling, bogo, chosen
## Topic 3: time, home, house, husband, im
## Topic 4: birthday, account, afternoon, amazon, anniversary
## Topic 5: day, account, afternoon, amazon, anniversary
## Topic 6: friend, account, afternoon, amazon, anniversary
## Topic 7: friends, home, house, husband, im
## Topic 8: accumulation, acquire, activate, ad, adding
## ........
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 6 (approx. per word bound = -7.594, relative change = 8.924e-04)
## ........
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 7 (approx. per word bound = -7.594, relative change = 8.966e-05)
## ........
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 8 (approx. per word bound = -7.591, relative change = 3.080e-04)
## ........
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 9 (approx. per word bound = -7.557, relative change = 4.514e-03)
## ........
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 10 (approx. per word bound = -7.522, relative change = 4.699e-03)
## Topic 1: beat, bogo, chosen, grad, boba
## Topic 2: apartment, straight, traveling, bogo, chosen
## Topic 3: time, beaches, bedside, prayer, adopt
## Topic 4: birthday, account, afternoon, amazon, anniversary
## Topic 5: day, accident, apple, account, afternoon
## Topic 6: friend, ive, account, afternoon, amazon
## Topic 7: friends, home, house, husband, im
## Topic 8: accumulation, acquire, activate, ad, adding
## ........
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 11 (approx. per word bound = -7.505, relative change = 2.246e-03)
## ........
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 12 (approx. per word bound = -7.498, relative change = 9.563e-04)
## ........
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 13 (approx. per word bound = -7.490, relative change = 1.049e-03)
## ........
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 14 (approx. per word bound = -7.473, relative change = 2.216e-03)
## ........
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 15 (approx. per word bound = -7.449, relative change = 3.257e-03)
## Topic 1: beat, paper, attributing, automated, calamari
## Topic 2: apartment, bogo, chosen, grad, paper
## Topic 3: time, awareness, bingo, bunking, choongil
## Topic 4: birthday, employer, sun, ticket, christmas
## Topic 5: day, home, house, husband, im
## Topic 6: friend, surprisingly, wasnt, cigarette, compound
## Topic 7: friends, beaches, bedside, prayer, giggles
## Topic 8: accumulation, acquire, activate, ad, adding
## ........
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 16 (approx. per word bound = -7.439, relative change = 1.322e-03)
## ........
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 17 (approx. per word bound = -7.436, relative change = 3.873e-04)
## ........
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 18 (approx. per word bound = -7.433, relative change = 4.130e-04)
## ........
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 19 (approx. per word bound = -7.426, relative change = 8.821e-04)
## ........
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 20 (approx. per word bound = -7.420, relative change = 8.768e-04)
## Topic 1: beat, animals, attended, attention, aunt
## Topic 2: apartment, paper, attributing, automated, calamari
## Topic 3: time, awareness, bingo, bunking, choongil
## Topic 4: birthday, employer, sun, ticket, absolutely
## Topic 5: day, home, house, husband, im
## Topic 6: friend, surprisingly, wasnt, cigarette, compound
## Topic 7: friends, basis, bath, brazilian, bridge
## Topic 8: accumulation, acquire, activate, ad, adding
## ........
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 21 (approx. per word bound = -7.417, relative change = 4.006e-04)
## ........
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 22 (approx. per word bound = -7.410, relative change = 8.949e-04)
## ........
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 23 (approx. per word bound = -7.379, relative change = 4.251e-03)
## ........
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 24 (approx. per word bound = -7.362, relative change = 2.208e-03)
## ........
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 25 (approx. per word bound = -7.362, relative change = 1.563e-05)
## Topic 1: beat, animals, attended, attention, aunt
## Topic 2: apartment, paper, attributing, automated, calamari
## Topic 3: time, adopt, alerting, bread, bucks
## Topic 4: birthday, ticket, employer, sun, absolutely
## Topic 5: day, home, house, husband, im
## Topic 6: friend, nap, wake, surprisingly, wasnt
## Topic 7: friends, aced, angeles, badge, boxes
## Topic 8: acrylic, add, additional, adult, advisor
## ........
## Completed E-Step (0 seconds).
## Completed M-Step.
## Model Converged
## A topic model with 8 topics, 8 documents and a 2928 word dictionary.
## Topic 1 Top Words:
## Highest Prob: beat, animals, attended, attention, aunt, baking, didnt
## FREX: beat, animals, attended, attention, aunt, baking, didnt
## Lift: beat, animals, attended, attention, aunt, baking, didnt
## Score: beat, animals, attended, attention, aunt, baking, didnt
## Topic 2 Top Words:
## Highest Prob: apartment, paper, attributing, automated, calamari, leave, moments
## FREX: apartment, paper, attributing, automated, calamari, leave, moments
## Lift: apartment, straight, traveling, bogo, chosen, grad, paper
## Score: apartment, straight, traveling, bogo, chosen, grad, boba
## Topic 3 Top Words:
## Highest Prob: time, adopt, alerting, bread, bucks, chops, collections
## FREX: time, awareness, bingo, bunking, choongil, cleaning, committed
## Lift: aquariums, argue, beans, blowjob, boys, brightened, catholic
## Score: time, aquariums, argue, beans, blowjob, boys, brightened
## Topic 4 Top Words:
## Highest Prob: birthday, ticket, employer, sun, absolutely, academic, ace
## FREX: birthday, employer, sun, ticket, aid, alley, angst
## Lift: birthday, editing, puzzles, selling, uber, wifes, employer
## Score: birthday, editing, puzzles, selling, uber, wifes, employer
## Topic 5 Top Words:
## Highest Prob: day, home, house, husband, im, job, life
## FREX: day, home, house, husband, im, job, life
## Lift: day, home, house, husband, im, job, life
## Score: day, ive, accident, apple, home, house, husband
## Topic 6 Top Words:
## Highest Prob: friend, nap, wake, surprisingly, wasnt, cigarette, compound
## FREX: friend, cigarette, compound, facial, initiative, michigan, packing
## Lift: abhai, carrying, cheerleading, choosing, ctrlfa, decision, finch
## Score: friend, abhai, carrying, cheerleading, choosing, ctrlfa, decision
## Topic 7 Top Words:
## Highest Prob: friends, aced, angeles, badge, boxes, ceo, choices
## FREX: friends, aced, angeles, badge, boxes, ceo, choices
## Lift: basis, bath, brazilian, bridge, canadian, celsius, central
## Score: friends, basis, bath, brazilian, bridge, canadian, celsius
## Topic 8 Top Words:
## Highest Prob: acrylic, add, additional, adult, advisor, affected, affections
## FREX: acrylic, add, additional, adult, advisor, affected, affections
## Lift: aa, abt, actions, activity, additions, adjusting, adobe
## Score: amde, acrylic, add, additional, adult, advisor, affected
From the result we can see the topic words for yong people[17,30] are “animals”,“beat”,“attention”…… For middle age[30,50] the topic words are “aparment”,“attributing”,“traveling”…… For old people[50,70] the topic words are “birthday”,“home”,“husband”…… For age[70,90], the topic words are “friends”,“badge”……
Summary: When people are young,they won’t consider more about reality. At this age, young people will consider more about how to enjoy life, so they will feel happily maybe when we adopt a animal;maybe when we get someone’s attention. But When they are at middle age, they will have a family, So,They will put more attention on reality. We will feel happy when we buy a new aparment or travel to someplace. When people retire they will foces more on themselves, such as their home and their health.